perm filename DIVERS[W87,JMC] blob
sn#832538 filedate 1987-01-19 generic text, type C, neo UTF8
COMMENT ā VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 ā19-Jan-87 0035 edsel!bhopal!jonl@navajo.stanford.edu Text of the "Diversion" proposal
C00019 ENDMK
Cā;
ā19-Jan-87 0035 edsel!bhopal!jonl@navajo.stanford.edu Text of the "Diversion" proposal
Received: from NAVAJO.STANFORD.EDU by SAIL.STANFORD.EDU with TCP; 19 Jan 87 00:35:39 PST
Received: by navajo.stanford.edu; Mon, 19 Jan 87 00:34:43 PST
Received: from bhopal.edsel.com by edsel.uucp (2.2/SMI-2.0)
id AA02969; Mon, 19 Jan 87 00:23:27 pst
Received: by bhopal.edsel.com (3.2/SMI-3.2)
id AA02362; Mon, 19 Jan 87 00:23:10 PST
Date: Mon, 19 Jan 87 00:23:10 PST
From: edsel!bhopal!jonl@navajo.stanford.edu (Jon L White)
Message-Id: <8701190823.AA02362@bhopal.edsel.com>
To: navajo!jmc%su-ai@navajo.stanford.edu
Subject: Text of the "Diversion" proposal
Distributed Multi-processing for Qlisp
INTRODUCTION
We propose to implement a message-passing based distributed Lisp
environment as a preliminary step towards the full multi-processor
implementation of a multi-processing Qlisp. There will be a 'master'
processor, which will send tasks to 'slave' processors, and then
coordinate their return values. Communication between the processors
will only be via this message-sending and receiving mechanism, and main
memory will not be "shared" amongst them. In effect, a process will be
assigned to exactly one processor, which must complete it and send a
result or a failure code back to its caller. This differs from the
Qlisp model in that that a process in Qlisp may migrate to any available
processor, and in that it will be implemented using the Common Lisp
semantics of CATCH rather than the Qlisp semantics.
MESSAGE FORMATS
Initially, "Messages" will be encoded as ascii strings to be read in
by the standard Lisp READ function, and evaluated. The communication
channel will be a pipe of whatever sort we can efficiently implement
on our machine (it could be an ethernet link too, but for the Alliant,
we expect to use some small number of shared pages for a much faster
pipe between the several processors).
We may consider developing a more specific Remote Procedure Protocol.
That is, instead of merely sending the text for forms to be evaluated,
we will devise a "record structure" which can be more quickly encoded
and decoded on either side. We have had some experience with the
Courier RPC formats in standard use by the Xerox Network Systems
protocols, and with the research prototype called "Cedar RPC" (also
developed by Xerox Palo Alto Research Center).
THE PROCESSORS -- Master and Slave
Each processor will be running an experimental multi-processing
version of Lucid Common Lisp. Although Lucid's product does not
yet have multi-processing capability, a primitive implementation
of so-called "stack groups" has been tested in-house for some
time now, and can provide the rudiments of a Lisp-based multi-
processing environment (note: "multi-processing", not "multi-
processor"). It currently does pre-emptive time-slicing (albeit
with a "dumb" scheduler algorithm, but we are not at this time
overly concerned with time-sharing efficiency), and provides
the capability for user-written code to be attached to certain
interrupt conditions.
We will need a multi-processing capability in order to implement
the main tasks and the communication-channel tasks all in Lisp.
Furthermore,
(1) When the master is interpreting a QLET, it will perhaps
apportion out jobs to several slaves to evaluate some
"arguments", subject to the restriction that the slave has
only a local environment and can only communicate with the
master via the "message-passing" channel. At least one
process will be required for the Lisp part of the
communication channel (and ideally, each communication would
spawn a separate process, which would be reclaimed when that
particular task request is completed); and at least one other
process will be necessary for the limited Qlisp interpreter.
(2) When a slave is running one task, it may receive yet another
service request. This will happen because the 'master' will
have no exact idea of just how many processors there are, so
will have to apportion out its workload based on something
other than the "one worker, one job" rule. But even if the
master were to know exactly how many processors are available,
there will always be situations in which processes will be spawned
faster than they can be completed, and the "backlog" will be have
to be apportioned out among the 'slaves'. In a real Qlisp, however,
the "backlog" would merely be appended to a central queue. Of
course, a slave also needs at least one dedicated task for the
communication channel service; we would expect also to have a
"dispatcher" task whose job is to take service requests off a
pending queue, and actually initiate the Lisp-based process which
will perform the service request.
We forsee no immediate problem in being able to run at least a few tens
of processes in each one the Lucid Common Lisp images; this should be
enough to support some reasonable programming experiments.
PROTOCOLS
Each task request will carry a unique identification tag, so that
communication for several tasks may be multiplexed over the one
communication channel between 'master' and 'slave' [in one sense,
this is merely a cheap way of setting up unique communication channels].
The id tag will be generated by the 'slave' when it accepts a task from
the 'master', and the immediate communication back to the 'master' of
the task id will be confirmation of the assignment. These tags will not
only serve a normal communications purpose, but will also aid in
monitoring and debugging the whole design.
We intend the communications to be performed as low-overhead, interrupt
driven protocols. While a 'slave' is multi-processing among its current
tasks, interrupt level operations will append new requests to a queue
of current "services", and they may even insert process-deletion requests
into it which will cause the abnormal termination of some already
running task. We also intend the "transmittal" and "receipt" of these
messages to be low-overhead in the amount of processing required;
for example, "receipt" will be little more than
(1) READing the input text, and parsing it into a trivial RPC
format that admits at most a few operations.
(2) assuming that the RPC request is for a new task assignment,
then generate a new task request with a unique id and post it
into the queue of all service requests; its initial [runnable]
state is merely a form to be evaluated, which is constructed
from the input text.
(3) send back to the caller [the 'master'?] confirmation of the
action requested.
These are all bounded, and relatively short, operations; we will
view them as the atomic steps of our communication protocol. This
way, there will be no critical sections in the communications, and
we will not need to address problems arising out of any additional
asynchrony caused by communication delays. In the real Qlisp model,
communication is "instantaneous" because of having a single address
space for all the processors, and because of the hardware interlocks
in a properly-shared memory bus [such as bus arbiters at the electrical
level, and "test-and-set", or "compare-and-swap" instructions at the
software level].
Normal completion of a task will also include sending the results back
to the 'master'; the unique id tag serves to dispatch the result to
the right "place" in the 'master'. Note however that killing a running
process will not be a time-bounded request, since it will typically
involve running the "clean-up" forms of unwind-protect clauses in the
task being killed. We are not currently contemplating "return values"
or "arguments" which are themselves of extremely large or unbounded size
[in which case there would need to be the capability for out-of-band
transmission, somewhat akin to the data-stream used for for a FTP
file transfer that is initiated by an "in-band" TELNET connection].
The RPC request types we anticipate are:
(1) New service request -- item (2) above;
(2) Status request, for a previously-established service request;
(3) Abnormal termination of a previously-established service request;
(4) Overall Status request -- such as reporting on the total number of
outstanding service requests;
If some of the embellishments described below are taken, then we may
also want:
(5) Re-order the queue, based on chaning priorities of the outstanding
service requests.
POTENTIAL EMBELLISHMENTS
We may find ourselves in the position of wanting to drop the
master/slave distinction, and expand the experiment with all
processors on equal footing. There would then arise simple questions
like "credentials" -- which processor or process is allowed to
requisition which resources (such as other processors, or more and
more spawned processes). The more complex questions would that would
arise have to do with "load balancing", and with the avoidance of
"mindless parallelism" wherein all available resources are consumed by
a breadth-first expansion of processes before some essential computation
is permitted to run (but which would have been run had the parallelism
been expanded depth-first). We explicitly do not want to investigate
load-balancing or resource-scheduling in general now; but we do
anticipate to need to look at the mechanisms required to support
CATCH in Qlisp, where a minimal amount of "process introspection"
seems to be necessary.
CODICIL
Possibly we may use this model to stage experiments in programming
in Qlisp. We need to know more about how particular programming
styles will be affected by the Qlisp parallelism; or put another way,
we want to know how to write programs which effectively utilize the
features of Qlisp. Among the kinds of questions for which we seek
pragmatic data as an aid to understanding are (1) what are the
bottle necks? are there too many processes active at some point in
time? (2) what is the degree of parallelism? will the QLET construct
be adequate for expressing the need to keep many processors busy?
what kinds of programs will be forced into an inherently serial mode?
(3) what problems will we encounter when trying to "kill" all the
processes spawened "underneath" a Qlisp CATCH or QCATCH? will we have
a "garbage-collection of processes" problem?